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Existing Genetic Algoritlims for crystal structure and polymorpli prediction can suffer from stagnation during 
evolution, with a consequent loss of efficiency and accuracy. An improved Genetic Algorithm (GA) is intro- 
duced herein which penalizes similar structures and so enhances structural diversity in the population at each 
generation. This is shown to improve the quality of results found for the theoretical prediction of simple model 
crystal structures. In particular, this method is demonstrated to find three new zero-temperature phases of the 
Dzugutov potential that have not been previously reported. 
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I. INTRODUCTION 

Genetic algorithms (GAs) are emerging as a useful tool in 
the theoretical prediction of crystal structures (see Abraham 
and Probert and references therein)^'^^ "*. During a GA calcu- 
lation it is possible that the system will stagnate. When stag- 
nation occurs, one or more local minima dominate the search 
and the method is unable to find the global minimum solu- 
tion. In this communication we improve the convergence to 
the global minimum solution of the CASTEP-GA' through 
the use of a fitness function which is able to differentiate struc- 
tures during the course of a GA minimization. 

Binary-encoded GAs such as the method of Hart et al.— , 
Blum et al.— are able to directly compare the binary strings 
that make up their population members and determine if two 
populations members are the same. In this way it is possi- 
ble to remove any highly prevalent local minimum from the 
population, and prevent its creation in future mating opera- 
tions. While this method is not possible in the frame-work of 
the CASTEP-GA, we have developed an alternative approach 
that significantly reduces the stagnation rate and also forces 
the system to explore new minima. This alternative approach 
is also broadly transferable to a wide range of other GAs. 



II. METHOD 

Our GA method' is a real-space encoded technique for 
crystal structure prediction which takes advantage of the pe- 
riodicity of the simulation supercell to improve the speed and 
accuracy of convergence to the global minimum free-energy 
crystal structure. There is a population of structures (or mem- 
bers) which are bred together to produce new members, such 
that with each subsequent generation the population evolves 
in an attempt to determine the global minimum structure. The 
fitness function of the GA is used to determine how good 
("fit") a structure is and this is then used to weight the prob- 
ability of survival of that structure and its probability to pro- 
duce offspring. 

While this method has been very successful in the past, 
we wanted to reduce the stagnation rate and thereby improve 
the quality of the solutions produced during a GA structure 
search. Since this is a real-space based approach it is not pos- 
sible to directly compare the atomic co-ordinates of two pop- 



ulation members to determine if they are the same structure. 
In our previous work', the enthalpy of the structure was used 
to calculate the fitness. In this work, we propose augment- 
ing this fitness function with an additional function which is 
able to determine the similarity of two structures. We shall 
illustrate the effectiveness of this new approach by first study- 
ing the Lennard-Jones crystals for comparison with our previ- 
ous results and then the high pressure phases of the Dzugutov 
potential^. 

The enthalpy-based fitness function is 



[1 - tanh(2p., - 1)] 



with the variable pi being defined by 



(1) 



(2) 



where Vmax is the enthalpy of the highest enthalpy member 
of the population, Vmin is the enthalpy of the lowest enthalpy 
member and Vi is the enthalpy of the member i being consid- 
ered. The fitness of each member i is fi and this is a function 
which varies between zero and one. Population members with 
a fitness close to zero are less "fit", and members with a fit- 
ness close to one are more "fit". Population members are then 
selected (using roulette-wheel selection) for reproduction or 
are removed from the population based on this fitness value. 

This should mean that only fit members are selected to re- 
main in the population, or are allowed to breed {crossover). It 
is often very likely that during the course of a calculation mul- 
tiple copies of population members are made. In a bit-string 
represented GA duplicate members are very easy to spot but in 
a real-space encoded GA it is very hard to tell if two members 
are the same during the course of a calculation, since the crys- 
tal structure may be orientated or translated in any way within 
the simulation cell (due to use of periodic boundary condi- 
tions). This is even harder if the simulation cell parameters 
are also allowed to vary during the course of a calculation. 

Hence we need a simple measure of structural similarity 
so that we can detect when duplicate structures exist within a 
population. Whilst this is encouraging from the point of view 
of ultimate structural convergence, in the early stages of the 
GA minimization we want to ensure as much structural diver- 
sity as possible to enable a broad search of possible solutions 
and so we want to penalise similar structures. 
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Since we are using this routine to differentiate between like 
and unlike structures, rather than any form of comprehensive 
structural analysis, we can simplify this comparison some- 
what. If we are performing a calculation in which we allow 
the number of atoms to vary, then we can make an educated 
guess that two structures with different numbers of atoms are 
different (or rather in this case any offspring produced in the 
crossover procedure will have a greater number of degrees of 
freedom to explore the potential energy surface), so we will 
have no need to compare these structures. We also do not 
need to compare each structure with all other structures, since 
we are merely trying to prevent stagnation rather than give a 
definitive structural comparison, and so we can simply com- 
pare all structures with the minimum enthalpy structure which 
has the same number of atoms as itself that exist in the current 
generation. We will define a comparison function between 
structures which returns zero if the structures are the same and 
one if the structures are suitably dissimilar. We also want to 
keep the fact that lower enthalpy structures are "better" than 
higher enthalpy ones, so we further weight the value that any 
given structure has by the value of fi of the fittest member in 
that "set" which is made up of members with the same num- 
ber of atoms. Here we define our improved fitness function 
as 

^ = (i--)^+-^-*{i?(4.)) 55/^^ (3) 

where the left-superscript j above denotes comparing be- 
tween groups with the same number of atoms only, fi is as 
defined in equation [1] w is a weighting value between zero 
and one and R (A (fc, )) is a function which compares mem- 
ber i of the set of atoms j with the fittest member in that set (as 
defined by equation[T]i. This means that the fitness of the fittest 
member of each group ( •'ffit ) will be unchanged from its en- 
thalpy value, and all other values in the group will be scaled 
accordingly. If the value of the fitness weight, w, is set to 1 
then the maximum value of ^fl that any member could have is 
the same value of the fittest member of the group, ^ffu- If w is 
set to zero then this function reduces to that given in equation 
[T] The comparison function R (A (fcr)) is 
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Figure 1: (Color online) Comparision of the Lennard- Jones and 
Dzugutov pair-potentials. 

m A c a B d b 
16 5.82 1.1 1.87 1.28 0.27 1.94 

Table I: Table of parameters used in the Dzugutov potential (equation 

of r„, and Jq (r) is a Bessel function. The function A (fcr) of 
a population member i of each group j is tested against the 
function A' (fc^) of the fittest member in the group j contain- 
ing the same number of atoms as member i. Equation |4] is 
then used to compare these two functions and returns a single 
number between zero and one. In a variable-supercell calcu- 
lation it is possible for this function to become greater than 
one when the structures are highly dissimilar, in which case 
we set the value of R (A (fcr)) to one. 

ni. RESULTS 

The results presented here will use two different empirical 
potentials, the Lennard-Jones potential^^ and the Dzugutov 
potential^ which is defined as 

$(i?„) =$l(i?„)+$2(i?y) (6) 

where 



i?(A(fcr)) 



E,JA-(fc.)-A(fcr) 
Efc A'(fcr) 



(4) 



where consideration of the spherically averaged scattering in- 
tensity leads to 



A (fcr) = 



N 



(5) 



n=l 
N N 



2 ^ ^ (n) p'2 (m) Jo (\/37rfcr |r„ - r„ 



n— 1 myn 



which is positive-definite (and is based on the Debye scat- 
tering formula**) . In equation |5] there are N ions within the 
simulation cell which has a volume 51, p' (n) is the scattering 
factor of ion n which has the atomic real-space co-ordinate 



$1 = 




- <^^P {r^) < « (7) 



Sexp(^) 




Rtj < b 
R,j > b 



(8) 



with the constants defined in table |I] A comparison of these 
two potentials is shown in figure [T] The Dzugutov potential 
was originally formulated to simulate liquid systems, however 
it has also been shown to have some interesting solid phasesii, 
and can also be used to form quasi-crystals^. 

The Dzugutov potential is designed such that the force on, 
and energy of, an atom moving within the potential go to zero 
at b/a. As is reported in Roth and Denton'' the Dzugutov 
potential has three known stable phases at varying pressures: 
BCC, the CT-phase and FCC. 
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Figure 2: (Color online) Summary of the enthalpies of the different 
Lennard-Jones structures found for different fitness weights, which 
controls how much the comparison factor is considered during selec- 
tion for update and crossover The values for ui = 0.0 are those from 
Abraham and Probert'. All points are averaged over 15 independent 
calculations. 
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Figure 3: (Color online) Summary of the convergence times for the 
results shown in figure |2] The values for ui = 0.0 are those from 
Abraham and Probert'. All points are averaged over 15 independent 
calculations. 
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Figure 4: (Color online) Plot showing convergence to HCP mini- 
mum structure for a Lennard-Jones calculation with w — 0.75. The 
stacking patterns of the minimum enthalpy solutions are shown next 
to their appearance during the course of the simulation. The system 
converged to a HCP structure in 55 generations, and by the 127^^ 
generation all members were the same. 
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Table II: Table comparing the number of each ordered structure type 
of the lowest enthalpy structure found (i.e. ignoring higher-enthalpy 
structures found during the course of a GA minimization) for differ- 
ent values of the fitness weighting factor w. Numbers given are out 
of a total of 15 calculations. 



A. Results from tlie Lennard-Jones potential 

The use of the comparison factor in the selection procedure 
has a marked effect on the quality of the results produced as 
seen in figure|2l While the global minimum structure is hexag- 
onal close packed (HCP) this structure is nearly degenerate 
with the face-centred cubic structure (FCC) (with an energy 
difference of less than 0.1 %i^). There are also a number of 
other stacking-fault structures which exist in-between FCC 
and HCP. The use of the comparison factor encourages the 
system to explore and hence escape from these local minima 
and find the HCP structure. With a fitness weight of u> = 0.75 
finding a HCP structure is much more likely. 

The effect on convergence is interesting as seen in figure 
|3] There is little increase in the mean number of generations 
required for convergence, although there is a greater spread in 
the values. 

Figure |4] shows the results from a calculation performed 
with w = 0.75. We have included these results in particu- 
lar because it shows the system going from an FCC structure 
to a HCP structure through two intermediate stacking-fault 



structures. 

These results are summarized in table |II] The results from 
w = 0.00 are those presented in Abraham and Probert—. 



B. Results from the Dzugutov potential 

For results obtained using this potential an additional mod- 
ification was made to the GA in the crossover step. Previously 
the atom-number could either be kept fixed or be allowed to 
vary in an unconstrained manner. For these Dzugutov calcu- 
lations a third option was added, which is to allow the atom 
number to vary within an allowed percentage of the original 
number of atoms within the simulation supercell. 

While this is not necessary in a fixed-cell size/shape calcu- 
lation, for a variable-cell calculation it is essential. Without 
this constraint it would be possible for the number of atoms to 
keep decreasing with the cell getting smaller and smaller un- 
til the minimum image convention is violated, at which point 
the calculation will stop. It might also allow a calculation to 
keep adding atoms at the crossover stage and then allow the 
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Figure 5: The unit cell of the Dzugutov potential cr-phase looking 
down the [001] direction. 
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* Data taken from Roth and Denton ' ' . 

* Where the term "Lower Enthalpy" refers to having lower enthalpy 
than the phase in column 2. 

Table III: Summary of results for 62-atom variable-cell, constrained 
variable-atom-number calculations. 22 independent GA calcula- 
tions were performed at each pressure. 



cell to grow to accommodate them. In this way the calculation 
would increase in size and take a longer and longer time for 
each minimization step. This percentage cut-off keeps the ad- 
vantages of a variable atom-number calculation without these 
problems. 

It is already known that the Dzugutov cr-phase has a com- 
plicated 30-atom unit cell (see figure|5]l and so all calculations 
had to have at least this many atoms. To prevent any bias of 
the final results, we started each run with 62 atoms in the unit 
cell and allowed the number of atoms to vary, in order to have 
an unbiased search of a large enough phase space. 

A summary of the Dzugatov results is given in table 
Hm Calculations were performed at four pressures, OMPa, 
50 MPa, 100 MPa and 150 MPa which allows each of the three 
structures suggested by Roth and Denton— to be the most sta- 
ble at at least one point during the experiment. 

As can be seen in table|III]a number of GA minimizations 
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Figure 6: (Color online) Convergence plot of a variable-atom, 
variable-cell calculation, starting from 62 atoms. This gives rise to 
a previously unknown phase (labeled as phase "a" in figure[8]l. The 
inset shows the complete calculation. The minimum-enthalpy struc- 
ture found has 65-atoms and is shown in figure |7] 

• •••• 

• •••• 

• •••• 

Figure 7: The structure of the new phase "a", a 65-atom phase found 
in generation 64 of the calculation shown in figure[6] 



found structures with a lower enthalpy than the previously 
reported minimum enthalpy structure. In total three distinct 
new structures were found. A plot showing the progress of a 
GA minimization down to the new lowest-enthalpy structure 
found is shown in figure |6] with the structure itself shown in 
figure |7] 

A plot comparing the radial distribution functions of all 
the known and unknown phases of the Dzugutov potential 
is shown in figure [8] as well as the HCP phase. The three 
new phases found are all significantly different from the es- 
tablished phases of this potential. Examination of these phases 
suggests that the simulation cells correspond to primitive cells 
and not supercells, but attempts to further characterize these 
structures by space group have so far been unsuccessful. The 
atomic co-ordinates of these structures are available online^^ 
as supplementary material. 

A plot showing the energy-volume curves for the six phases 
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Figure 8: (Color online) Comparison of the radial distribution func- 
tion, g (r), for the distinct lower-enthalpy structures found with 
BCC, FCC, HCP and the cr-phase. The Dzugutov Potential is also 
shown. 

of the Dzugutov potential found in the course of this study is 
shown in figure |9] Phase "a" is the most stable phase at all 
positive pressures. 

IV. CONCLUSIONS 

In this paper we have developed a novel fitness function 
that combines a traditional approach to fitness based upon en- 



thalpy, with a simple structural comparison factor to find new, 
more stable crystal structures within a GA for crystal struc- 
ture prediction. This method penalises the presence of similar 
structures within the population which prevents the GA stag- 
nating in some local minimum. The GA method itself was 
also extended to allow both the simulation supercell and the 
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Figure 9: (Color online) Energy- Volume curve for the Dzugutov po- 
tential showing the three new phases calculated at zero pressure. The 
curves for the a-phase and structures "a", "6" and "c" were calcu- 
lated assuming an isotropic expansion. 



number of atoms within that supercell to vary. The number of 
atoms must only be varied within fixed limits to prevent the 
system size becoming too large or too small. 

Studies using the Lennard-Jones potential showed the cal- 
culation progressing through the FCC local minimum and 
two other stacking-fault local minima before finding the HCP 
global minimum enthalpy structure. This was shown to be 
repeatable and efficient. 

When this new GA was used to study phases of the 
Dzugutov potential at different pressures all the previously re- 
ported zero-temperature phases were found, along with three 
new phases, one of which is the most stable phase at all pos- 
itive pressures. These new structures are markedly different 
from the three previously-known phases. This clearly illus- 
trates the power of this GA to find new crystal structures that 
were hitherto unexpected. 
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